Notify lookup sync of gossip processing results #5722
Conversation
Force-pushed from 90c30b3 to e6c8955
Squashed commit of the following:

commit e6c8955
Author: dapplion <[email protected]>
Date: Mon May 6 22:52:48 2024 +0900

    Notify lookup sync of gossip processing results
I was thinking about this more. In order to avoid sync having to know anything about block processing results (explicitly), can't we do the following:
Eventually the block will either fail or be processed. We will eventually get more requests for the chain of blocks, and eventually it will succeed, because the block will either be failed by the beacon processor or it will be successful. The issue is just around restarting the chains. To prevent loops, we could wait a slot or something, as sketched below. Just thinking this would be a way to avoid explicit communication from the beacon processor.
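A minimal sketch of this retry-with-delay alternative, assuming a 12-second slot; ChainLookup, schedule_retry, and the constant are illustrative names, not Lighthouse's actual API:

```rust
use std::time::{Duration, Instant};

// Illustrative slot duration; mainnet uses 12 seconds.
const SLOT: Duration = Duration::from_secs(12);

struct ChainLookup {
    next_retry: Instant,
}

impl ChainLookup {
    // Park the chain for a slot instead of dropping it, so the beacon
    // processor has time to finish (or fail) the block without sync
    // needing an explicit notification.
    fn schedule_retry(&mut self) {
        self.next_retry = Instant::now() + SLOT;
    }

    // Poll this before re-requesting the chain; prevents tight retry loops.
    fn should_retry(&self) -> bool {
        Instant::now() >= self.next_retry
    }
}
```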
For child lookups it's a bit tricky. Assume you are a bit behind, such that given this chain of blocks 6 <- 7 <- 8 <- 9 <- 10:
You have to create a lookup for slot 10, fetch the block, attempt to process it, get unknown parent, fetch 9, etc. until 6. If 6 is processing, lookup 7 needs some event to make progress. So the beacon processor must tell sync that block 6 has been processed. Otherwise, dropping all lookups that are children of 6 while 6 is processing sounds silly. You must leave child lookups of 6 paused somehow, awaiting their ancestor to process. Sync lookup technically doesn't need to know about the processing result; it would be enough to have a signal to continue children of 6 (sketched below). But then we have to add a new SyncMessage and have a dedicated handler in block lookups. While it reduces how much information the processor leaks, I don't see how it simplifies things. We already have a BlockComponentProcessed SyncMessage.
Note: the moment that sync lookup checks the chain's processing cache or da_checker, we break the separation of concerns between sync lookup and gossip processing. If we are crossing that line, let's implement a proper, informationally complete fix. Notice that these ignores will happen thousands of times, one per attestation, unless we have something like #5706.
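To make the trade-off concrete, here is a sketch of the pause-and-resume mechanism described above; every name (LookupState, on_parent_processed, etc.) is hypothetical, not Lighthouse's actual lookup code:

```rust
type Hash256 = [u8; 32];

enum LookupState {
    Downloading,
    // Parked: the parent (block 6 in the example) is still processing.
    AwaitingParentProcessing { parent_root: Hash256 },
    ReadyToProcess,
}

struct Lookup {
    state: LookupState,
}

// Called when the beacon processor reports a result for parent_root.
// Without an event like this, a lookup parked on block 6 never progresses.
fn on_parent_processed(lookups: &mut [Lookup], parent_root: Hash256, success: bool) {
    for lookup in lookups.iter_mut() {
        let parked_on_parent = matches!(
            lookup.state,
            LookupState::AwaitingParentProcessing { parent_root: p } if p == parent_root
        );
        if parked_on_parent {
            lookup.state = if success {
                // Parent is imported: the child can now be processed.
                LookupState::ReadyToProcess
            } else {
                // Parent failed: retry the download rather than dropping the chain.
                LookupState::Downloading
            };
        }
    }
}
```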
@@ -170,21 +172,20 @@ impl<T: BeaconChainTypes> NetworkBeaconProcessor<T> {
    if reprocess_tx.try_send(reprocess_msg).is_err() {
        error!(self.log, "Failed to inform block import"; "source" => "rpc", "block_root" => %hash)
    };
if matches!(process_type, BlockProcessType::SingleBlock { .. }) {
Why are we removing this match?
Is it redundant because we now only process single blocks here?
Yes, blobs are processed in process_rpc_blobs
beacon_node/network/src/network_beacon_processor/sync_methods.rs (outdated review thread, resolved)
@mergify queue
🛑 The pull request has been removed from the queue
@mergify refresh
✅ Pull request refreshed
@mergify queue
✅ The pull request has been merged automatically at 93e0649
General problem
Lookup sync and gossip interact in very racy ways. They interact at all because we want to prevent downloading things via ReqResp that we are already processing. Note that this optimization carries complexity, so we may want to decide whether it's worth it at all.
Consider the image above: we can receive an attestation referencing a block not yet imported into fork-choice at these moments in time (see the classification sketch after this list):
- A, before processing: easy, download and process the block.
- B, during processing: the block is already processing, so we should skip the download. However, we must consider the pitfalls below: processing may fail, and child lookups need to learn when processing completes.
- C, while waiting for blobs: the block is fully processed but not yet in fork-choice. In this case we don't have to worry about processing failures, but the child lookup issue remains.
- D, after import: this trigger should never happen. If it does, we will download all block components and get a BlockAreadyKnown error.
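For concreteness, a sketch of the classification these four moments imply, using the cache names from this PR's description; the closure parameters are simplified placeholders, not Lighthouse's actual API:

```rust
type Hash256 = [u8; 32];

enum Trigger {
    BeforeProcessing, // A: download and process the block
    DuringProcessing, // B: skip download, but we must hear the processing result
    AwaitingBlobs,    // C: block is valid, only blobs are missing
    AfterImport,      // D: should never happen; would end in BlockAreadyKnown
}

fn classify(
    block_root: Hash256,
    fork_choice_contains: impl Fn(Hash256) -> bool,
    da_checker_has_block: impl Fn(Hash256) -> bool,
    pre_import_cache_contains: impl Fn(Hash256) -> bool,
) -> Trigger {
    if fork_choice_contains(block_root) {
        Trigger::AfterImport
    } else if da_checker_has_block(block_root) {
        Trigger::AwaitingBlobs
    } else if pre_import_cache_contains(block_root) {
        Trigger::DuringProcessing
    } else {
        Trigger::BeforeProcessing
    }
}
```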
Issues with current unstable

da9d386
^ introduces the optimization to prevent downloads for triggers B and C. However, it fails to explicitly handle both pitfalls listed above: if processing fails, a lookup may get stuck. It also creates a spammy situation where a lookup gets dropped immediately, once per attestation, while the block is still processing.

^ Fixed the spammy loop, but it makes the case of processing failure worse: completed chains remain in a hashset for 60 seconds, so a failing lookup is blocked from being retried for that period.
Proposed Changes
Sync lookup must be aware of gossip processing results. I think there's no way around that.
The least invasive way to achieve that IMO is to notify sync with a BlockComponentProcessed event once gossip processing of a block completes (a sketch of the plumbing follows).
With the above we tackle both pitfalls listed above: a lookup whose block fails gossip processing can retry, and child lookups get an event to resume on.
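A sketch of that plumbing with simplified stand-ins for Lighthouse's types; the real BlockComponentProcessed message and its fields may differ:

```rust
use std::sync::mpsc::Sender;

type Hash256 = [u8; 32];

enum BlockProcessingResult {
    Imported,
    Err(String),
}

enum SyncMessage {
    // The event lookup sync handles; carries the gossip processing outcome.
    BlockComponentProcessed {
        block_root: Hash256,
        result: BlockProcessingResult,
    },
}

// Invoked by the network beacon processor after gossip processing of a
// block completes. Emitting on both success and failure is what unsticks
// lookups created for trigger B.
fn on_gossip_block_processed(
    sync_tx: &Sender<SyncMessage>,
    block_root: Hash256,
    result: BlockProcessingResult,
) {
    let _ = sync_tx.send(SyncMessage::BlockComponentProcessed { block_root, result });
}
```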
Pitfalls
This PR relies on these assumptions:
- If availability_checker.has_block(block) returns true, the block is permanently valid and is only missing blobs.
- If reqresp_pre_import_cache.contains_key(block) returns true, a BlockComponentProcessed event MUST be emitted some time in the future.
We don't have e2e tests to assert those conditions at the moment. I think code comments are good enough for now, but it would be great to codify them in the future somehow (see the sketch below).
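One possible way to codify those assumptions in-process, as a sketch; InvariantCtx and its fields are hypothetical, and today the invariants live only in code comments:

```rust
type Hash256 = [u8; 32];

// Snapshot of the state relevant to the two assumptions above.
struct InvariantCtx {
    da_checker_has_block: bool,
    block_is_valid: bool,
    only_blobs_missing: bool,
    pre_import_cache_hit: bool,
    processed_event_scheduled: bool,
}

fn assert_lookup_invariants(ctx: &InvariantCtx, block_root: Hash256) {
    // Assumption 1: availability_checker.has_block(block) == true implies
    // the block is permanently valid and only blobs are missing.
    if ctx.da_checker_has_block {
        debug_assert!(
            ctx.block_is_valid && ctx.only_blobs_missing,
            "da_checker invariant violated for {:?}",
            block_root
        );
    }
    // Assumption 2: a reqresp_pre_import_cache hit implies a
    // BlockComponentProcessed event will eventually be emitted.
    if ctx.pre_import_cache_hit {
        debug_assert!(
            ctx.processed_event_scheduled,
            "no BlockComponentProcessed scheduled for {:?}",
            block_root
        );
    }
}
```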
Todo
Closes #5693